Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New function offline_run! to write data during run! #815

Merged
merged 43 commits into from
Jul 10, 2023

Conversation

mastrof
Copy link
Contributor

@mastrof mastrof commented Jun 3, 2023

Replaces #630

@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2023

Codecov Report

Merging #815 (28c75b0) into main (210e3ac) will increase coverage by 0.26%.
The diff coverage is 80.28%.

@@            Coverage Diff             @@
##             main     #815      +/-   ##
==========================================
+ Coverage   69.76%   70.02%   +0.26%     
==========================================
  Files          41       42       +1     
  Lines        2636     2706      +70     
==========================================
+ Hits         1839     1895      +56     
- Misses        797      811      +14     
Impacted Files Coverage Δ
...AgentsOSMVisualizations/AgentsOSMVisualizations.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/abmplot.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/convenience.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/daisyworld_def.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/inspection.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/interaction.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/lifting.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/model_observable.jl 0.00% <ø> (ø)
ext/AgentsVisualizations/src/utils.jl 0.00% <ø> (ø)
src/submodules/io/AgentsIO.jl 100.00% <ø> (ø)
... and 2 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@Datseris Datseris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is CSV the best thing to use here? Aren't there any other binary-based data formats that can be used instead, and loaded directly into a dataframe...?

Project.toml Outdated Show resolved Hide resolved
src/simulations/collect.jl Outdated Show resolved Hide resolved
src/simulations/collect.jl Outdated Show resolved Hide resolved
src/simulations/collect.jl Outdated Show resolved Hide resolved
src/simulations/collect.jl Outdated Show resolved Hide resolved
@mastrof
Copy link
Contributor Author

mastrof commented Jun 4, 2023

Is CSV the best thing to use here? Aren't there any other binary-based data formats that can be used instead, and loaded directly into a dataframe...?

Is the problem with CSV "only" that it is not binary, or something else?
At the moment I don't know an easy way to append to file with e.g. JLD2. I believe you would have to reload the dataframe to modify it in place and save it again, which would make the whole thing useless.. Maybe there is a way, but I don't know it. I will look into some of the larger simulation packages (e.g. Oceananigans comes to mind), maybe someone already found a solution to this.

@Datseris
Copy link
Member

Datseris commented Jun 4, 2023

I am surprised that DataFrames.jl does not provide a binary file format that can also be read and written to line-by-line. Seems like the data science people would have figured this out by now.

@Datseris
Copy link
Member

Datseris commented Jun 4, 2023

Yes the problem with CSV is "only" that it is non-binary, but this is a rather big problem. If you are using this functionality, you probably have a lot of data, and to my knowledge text-based ways to save numerical data are the most inefficient in terms of memory. So the disk space you will fill with this will be very large.

@mastrof
Copy link
Contributor Author

mastrof commented Jun 10, 2023

Haven't taken time to look for an alternate solution to CSV yet.
But if we agree on the current logic then it will be just a matter of substituting CSV.write with a different write function once an appropriate candidate is found.

Copy link
Member

@Datseris Datseris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We fully agree on the logic. DOn't forget to:

  • increment minor version in project.toml
  • add changelog entry udner new minor version
  • add the new function in the documentation (API page). I guess it should go under the Save. Load, Checpoints section
  • Mention the function in run! by saying "see also [offline_run!](@ref)

It would be great if we could have a more disk-efficient backend to save. But CSV files are nice as well and some users may prefer them for other reasons. I propose to al;ready anticipate the possibility of different backends and provide a keyword argument that configures what backend should be used.

@mastrof
Copy link
Contributor Author

mastrof commented Jun 11, 2023

I propose to al;ready anticipate the possibility of different backends and provide a keyword argument that configures what backend should be used.

Would something like this work, at the beginning of offline_run!? Other backends could then be included by adding some elseifs, but I have a feeling this might be very bad performance-wise.

write_to_file(filename, data, append) = if lowercase(backend) == "csv"
    AgentsIO.CSV.write(filename, data; append=append)
else
    throw(ArgumentError("Backend $(backend) not supported."))
end

Would this be improved by defining a "maker" function?

function make_writer(backend)
    s = lowercase(backend)
    if s == "csv"
        return (filename, data, append) -> AgentsIO.CSV.write(filename, data; append=append)
    elseif s == "mybackend"
        return (filename, data, append) -> MyBackend.write(filename, data, append)
    else 
        throw(ArgumentError("Backend $(backend) not supported."))
    end
end

and then inside of offline_run! have a line write_to_file = make_writer(backend)?

@Datseris
Copy link
Member

creating a maker function is indeed a better idea. However you should also introduce a function barrier: after you initialize all data frames and the maker function, pass all of these into another function. This establishes type stability where it matters.

Copy link
Member

@Datseris Datseris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be merged unless conflicts are resolved (rebase wiht main). Please don't forget teh function barrier I am sure it will have a positive impact on performance.

src/simulations/collect.jl Outdated Show resolved Hide resolved
@fbanning
Copy link
Member

Arrow backend is implemented now. In the process I've condensed the functions a bit. Also added tests for Arrow functionality and they passed locally on my machine.

The workflow for adding a new backend would now be:

  • Add writer_$backend function taking filename, data, append as inputs.
  • Add backend to get_writer function.
  • Done (hopefully). :)

@fbanning
Copy link
Member

https://github.com/JuliaDynamics/Agents.jl/actions/runs/5333264438/jobs/9663511968?pr=815#step:7:357

Can anyone test this on Windows? I don't have a Windows machine.

src/simulations/collect.jl Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
Copy link
Member

@fbanning fbanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this change doesn't make any sense. When the file doesn't exist or when we want to overwrite an existing file (i.e. !append), then we need to pass file = false to the write function. Please revert it.

@Tortar
Copy link
Member

Tortar commented Jul 10, 2023

yes, you are right, it was a last, really desperate, attempt to solve the problem, don't know why but my Windows didn't complain locally for this :D, anyway I tried to dig up a bit for sometime and I still have no clue why Windows denies the possibility to remove the file

@fbanning
Copy link
Member

@Tortar Did you run the tests locally on your windows machine and they all passed? If so, then we can be sure that this is a CI windows problem.

@Tortar
Copy link
Member

Tortar commented Jul 10, 2023

no I didn't explain myself well, they passed when I changed the lines with the non sensical approach :D (but this is probably because I also changed something else), they don't pass locally on my Windows machine, the file mdata.arrow can't be removed, but I didn't find any fix

@fbanning
Copy link
Member

Ah sorry, misunderstood you then. Hm, really weird that it only fails on Windows but has no problems on Linux and MacOS. No idea how to approach this myself, as well, sorry. :/

@Datseris
Copy link
Member

Sorry for being away but I am overwhelmed with other projects and won't have much time for Agents.jl for the next semester or so... I'll be giving comments if you explicitly as for them by tagging me @Tortar or @fbanning but will have notifications off for the next semester :(

Now, to finish this: if I understand correclty the status quo is that this works fine, but doesn't work on windows, yes? If so, can we then add an error to the first line of the function that checks the OS and prints and error, and open an issue that this function doesn't seem to work as expect in windows with Arrow? In the test suite we only run the test if OS is not windows. Then we can merge? Because everything else seems fine.

@fbanning
Copy link
Member

If so, can we then add an error to the first line of the function that checks the OS and prints and error, and open an issue that this function doesn't seem to work as expect in windows with Arrow? In the test suite we only run the test if OS is not windows.

Sure thing. I'll add some clauses for that.

Then we can merge? Because everything else seems fine.

I'll do that right after pushing the required OS handling changes and after the tests finally pass.

@Datseris Datseris merged commit 6902229 into JuliaDynamics:main Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants